On-line learning of language models with word error probability distributions
We are interested in the problem of learning stochastic language models on-line (without speech transcriptions) for adaptive speech recognition and understanding. In this paper we propose an algorithm to adapt to variations in the language model distributions based on the speech input only and without its true transcription. The on-line probability estimate is defined as a function of the prior and word error distributions. We show the effectiveness of word-lattice based error probability distributions in terms of Receiver Operating Characteristics (ROC) curves and word accuracy. We apply the new estimates P_adapt(w) to the task of adapting on-line an initial large vocabulary trigram language model and show improvement in word accuracy with respect to the baseline speech recognizer.
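The abstract defines the on-line estimate as a function of the prior and word error distributions but does not give the formula. A minimal sketch of one plausible reading, in which each recognized word contributes its confidence (one minus its error probability) as a fractional count that is then interpolated with the prior, could look like this; the function name, the interpolation weight `lam`, and the unigram simplification are all assumptions, not the paper's actual estimator:

```python
from collections import defaultdict

def adapt_unigram(prior, hypotheses, lam=0.8):
    """Hypothetical on-line unigram adaptation.

    prior: dict mapping word -> P_prior(w)
    hypotheses: list of (word, confidence) pairs from the recognizer,
        where confidence approximates 1 - P(word error).
    Returns a dict with interpolated probabilities P_adapt(w).
    """
    counts = defaultdict(float)
    total = 0.0
    for word, conf in hypotheses:
        counts[word] += conf   # expected-correct fractional count
        total += conf
    adapted = {}
    for w, p in prior.items():
        online = counts[w] / total if total > 0 else 0.0
        adapted[w] = lam * p + (1.0 - lam) * online
    return adapted
```

Using soft (confidence-weighted) counts rather than hard 1/0 counts is what lets the adaptation proceed without true transcriptions: unreliable regions of the lattice contribute little mass.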
Non-native children speech recognition through transfer learning
This work deals with non-native children's speech and investigates both
multi-task and transfer learning approaches to adapt a multi-language Deep
Neural Network (DNN) to speakers, specifically children, learning a foreign
language. The application scenario is characterized by young students learning
English and German and reading sentences in these second-languages, as well as
in their mother language. The paper analyzes and discusses techniques for
training effective DNN-based acoustic models starting from children native
speech and performing adaptation with limited non-native audio material. A
multi-lingual model is adopted as baseline, where a common phonetic lexicon,
defined in terms of the units of the International Phonetic Alphabet (IPA), is
shared across the three languages at hand (Italian, German and English); DNN
adaptation methods based on transfer learning are evaluated on significant
non-native evaluation sets. Results show that the resulting non-native models
yield a significant improvement over a mono-lingual system adapted to
speakers of the target language.
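The transfer-learning adaptation described above keeps shared multilingual representations while adapting to limited non-native audio. A minimal, purely illustrative sketch of the layer-freezing idea (the network, its sizes, and the single-step MSE update are all assumptions, not the paper's architecture or recipe) is:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class TinyNet:
    """Toy two-layer feedforward net standing in for a DNN acoustic model."""

    def __init__(self, d_in, d_hid, d_out):
        # W1 plays the role of the shared (multilingual) representation;
        # W2 plays the role of the task-specific output layer.
        self.W1 = rng.standard_normal((d_in, d_hid)) * 0.1
        self.W2 = rng.standard_normal((d_hid, d_out)) * 0.1

    def forward(self, x):
        self.h = relu(x @ self.W1)
        return self.h @ self.W2

    def adapt_step(self, x, y, lr=0.01):
        # Transfer-learning adaptation by freezing: only the output layer
        # W2 is updated on the (small) adaptation set; W1 stays fixed.
        pred = self.forward(x)
        grad_out = 2.0 * (pred - y) / len(x)   # dMSE/dpred
        self.W2 -= lr * (self.h.T @ grad_out)  # top-layer update only
```

Freezing lower layers is one common way to adapt with limited in-domain data, since it restricts the number of free parameters the small adaptation set must estimate.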
Automatic assessment of spoken language proficiency of non-native children
This paper describes technology developed to automatically grade Italian
students (ages 9-16) on their English and German spoken language proficiency.
The students' spoken answers are first transcribed by an automatic speech
recognition (ASR) system and then scored using a feedforward neural network
(NN) that processes features extracted from the automatic transcriptions.
In-domain acoustic models, employing deep neural networks (DNNs), are derived
by adapting the parameters of an original out-of-domain DNN.
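The grading pipeline above scores features extracted from automatic transcriptions. As a hypothetical illustration of the kind of fluency and lexical features such a scorer might consume (these particular features and names are assumptions, not the ones used in the paper):

```python
def transcript_features(words, duration_s):
    """Toy feature vector computed from an ASR transcription.

    words: list of recognized word tokens
    duration_s: duration of the spoken answer in seconds
    """
    n = len(words)
    return {
        # fluency proxy: words per second
        "speech_rate": n / duration_s if duration_s > 0 else 0.0,
        # lexical diversity proxy: distinct words over total words
        "type_token_ratio": len(set(words)) / n if n else 0.0,
        # crude lexical-complexity proxy
        "mean_word_len": sum(map(len, words)) / n if n else 0.0,
    }
```

A feedforward scorer of the kind the abstract describes would then map such a feature vector to a proficiency grade.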
Evaluation of automatic transcription systems for the judicial domain
This paper describes two different automatic transcription systems
developed for judicial application domains for the Polish and Italian
languages. The judicial domain requires coping with several factors
known to be critical for automatic speech recognition, such as
background noise, reverberation, spontaneous and accented speech,
overlapped speech, and cross-channel effects.
The two automatic speech recognition (ASR) systems have been developed
independently starting from out-of-domain data and, then, they have
been adapted to the judicial domain using a certain amount of
in-domain audio and text data.
The ASR performance has been measured on audio data acquired in the
courtrooms of Naples and Wroclaw. The resulting word error rates are
around 40% for Italian and between 30% and 50% for Polish.
This performance, similar to that reported for other comparable ASR
tasks (e.g. meeting transcription with distant microphones), suggests
that possible applications can address tasks such as indexing and/or
information retrieval in multimedia documents recorded during judicial
debates.
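The word error rates quoted above follow the standard definition: the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the reference length. A minimal sketch of that computation:

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i            # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j            # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is not unusual on noisy, far-field courtroom audio.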
EnetCollect in Italy
In this paper, we present the enetCollect COST Action, a large network project, which aims at initiating a new Research and Innovation (R&I) trend on combining the well-established domain of language learning with recent and successful crowdsourcing approaches. We introduce its objectives, and describe its organization. We then present the Italian network members and detail their research interests within enetCollect. Finally, we report on its progression so far.
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall "Cavallerizza Reale". The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.
Word Spotting - Work in Progress
This report describes word spotting activity carried out at IRST. Performance is evaluated for different artificial tasks defined within the APASCI database. The base system makes use of the HMM framework developed in recent years at IRST. In particular, some parameters like the number of keywords, their length, and the use of different filler models (acoustical, lexical, syntactical) are investigated.
Experiments show, as expected, that the length of the keywords seems to be the most crucial factor, i.e. short keywords are more difficult to spot than longer ones. Moreover, it is observed that the choice of the filler models strongly influences performance.
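In HMM-based word spotting of the kind described above, a keyword is typically accepted when its acoustic score sufficiently exceeds that of a filler (background) model over the same segment. A minimal, hypothetical sketch of that decision rule (the per-frame normalization and the threshold value are illustrative assumptions):

```python
def spot_keyword(kw_logprob, filler_logprob, n_frames, threshold=0.5):
    """Hypothetical keyword-spotting decision.

    kw_logprob: log-likelihood of the segment under the keyword HMM
    filler_logprob: log-likelihood under the filler/background model
    n_frames: segment length, used to normalize the score
    Returns True if the keyword is detected.
    """
    # Per-frame log-likelihood ratio: positive when the keyword model
    # explains the audio better than the filler model.
    llr = (kw_logprob - filler_logprob) / n_frames
    return llr > threshold
```

The choice of filler model matters precisely because it sets the denominator of this ratio; a filler that models non-keyword speech poorly inflates the ratio and produces false alarms, which is consistent with the report's observation.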